8  Landscape Genetics

Session Presenters

Required packages

library(dartRverse)
library(ggplot2)
library(dismo)
library(leaflet)
library(htmltools)
library(leaflegend)
library(sf)
library(raster)

make sure you have the packages installed, see Install dartRverse

Introduction

Landscape genetics combines landscape ecology with population genetics to understand how the structure, composition and configuration of the landscape affects gene flow, genetic drift, and selection.

In this tutorial, we will dive into the first question - how does the landscape influence gene flow? There are many ways to tackle this question, for example, we can explore gene flow at the individual or population level, over different spatial scales, and/or across a meta-population.

Here we’ll start with the basics, by looking at the spatial distribution of individuals, and build on this until we incorporate landscape features, to understand:

  1. Different dispersal strategies; and

  2. Limitations to gene flow, including:

    a) Geographic distance (isolation-by-distance);

    b) Geographic barriers (isolation-by-barrier);

    c) Landscape attributes (isolation-by-resistance).

1. Dispersal strategies

Many biological and demographic processes can shape patterns of genetic structure, for example, dispersal strategies, mating systems, and social behaviours. These processes often occur over a fine scale. As such, exploring patterns within populations over 10s-100s of metres can be very informative, especially in small and inconspicuous species.

Before diving into more complex and correlative analyses, it is useful to understand some of the baseline life-history and demographic traits in your species. One common question is: how far do individuals disperse and is this the same or different for males and females?

When one sex disperses further than the other, this is known as sex-biased dispersal. This can be detected via genetic spatial autocorrelation analysis. We’ll describe this in more detail later on, but in short, increased philopatry by one sex results in greater genetic similarity among neighbouring individuals, while genetic similarity is reduced in the dispersing sex. You can also learn more about it in Banks & Peakall 2012 (and references therein).

What is philopatry?

Natal philopatry describes when an animal remains close or returns to the area where they were born.

To explore the genetic patterns that result from sex-biased dispersal, we’re going to simulate this process in a hypothetical animal population. First, we need to set up the dispersal kernels, using the custom function below.

# Create a function, where: 
# x = minimum to maximum distance, 
# d0 = mean distance, 
# p= proportion  that go to mean
p2p <- function(x, d0, p)
{
  return (exp(((x/d0)*log(p))))
}

# Dispersal kernels:
# Females
dispfemale <- (p2p(x = 1:50, # range of distances
                   d0 = 1, # mean dispersal distance
                   p = 0.5)) # proportion that go to mean
pdispfemale <- c(0, cumsum(dispfemale)/sum(dispfemale))

# Males
dispmale <- (p2p(x = 1:50, # range of distances
                 d0 = 20, # mean dispersal distance
                 p = 0.5)) # proportion that go to mean
pdispmale <- c(0, cumsum(dispmale)/sum(dispmale))

Let’s look at the dispersal curves. In this case, males disperse further than females (i.e., there is male-biased dispersal):

# Plot
plot(dispfemale, type = "l", col = "#E69F00", lwd = 2, main = "Female dispersal curve",
    xlab = "Distance (m)", ylab = "")

plot(dispmale, type = "l", col = "#5D3FD3", lwd = 2, main = "Male dispersal curve",
    xlab = "Distance (m)", ylab = "")

Next we’ll create a function that generates a population using glSim, which simulates simple SNP data and returns a genlight object. In our hypothetical population, females randomly mate with the nearest male, produce two offspring at a 50:50 sex-ratio, and the offspring then disperse following the dispersal probabilities created above.

# Create a function, where: Nind = number of individuals, Nsnp = number
# of SNPs, pdispF = female dispersal probability (generated above)
# pdispM = male dispersal probability (generated above) seed = set the
# seed so simulation is repeatable
SimDisp <- function(Nind, Nsnp, pdispF, pdispM, seed) {

    set.seed(seed)

    # Simulate a genlight object
    gg <- glSim(n.ind = Nind, n.snp.nonstruc = Nsnp, grp.size = c(0.999,
        0.001), ploidy = 2, k = 1, LD = FALSE, ind.names = paste(sprintf("ind%03d",
        1:Nind), sep = ""), snp.names = paste(sprintf("snp%03d", 1:Nsnp),
        sep = ""))

    # Define pop
    pop(gg) <- rep("A", Nind)

    # Create all the other parameters
    gg <- gl.compliance.check(gg)

    # Define sex using a sex ratio of 0.5
    ds <- c(rep("F", 0.5 * Nind), rep("M", (1 - 0.5) * Nind))
    ds <- ds[order(runif(length(ds)))]
    gg@other$ind.metrics$sex <- factor(ds)

    # Set coordinates
    xy <- expand.grid(x = 1:100, y = 1:100)
    # sample from the grid
    xys <- xy[sample(1:nrow(xy), Nind, replace = FALSE), ]
    gg@other$ind.metrics$x <- xys$x
    gg@other$ind.metrics$y <- xys$y

    for (ii in 1:20) {
        # Run for 20 generations

        # Find mating pairs & reproduce
        females <- which(gg@other$ind.metrics$sex == "F")
        males <- which(gg@other$ind.metrics$sex == "M")

        off <- list()

        for (i in 1:length(females)) {
            mfemale <- gg[females[i], ]
            fxy <- c(gg@other$ind.metrics$x[females[i]], gg@other$ind.metrics$y[females[i]])
            mxy <- cbind(gg@other$ind.metrics$x[males], gg@other$ind.metrics$y[males])
            xxyy <- rbind(fxy, mxy)

            # Find closest mating male
            chosenm <- which.min(as.matrix(dist(xxyy))[1, -1])
            mmale <- gg[males[chosenm], ]

            doff <- gl.sim.offspring(mmale, mfemale, sexratio = 0.5, noffpermother = 2)  # 2 offspring
            doff@other$ind.metrics <- data.frame(sex = factor(c("F", "M")),
                x = mfemale@other$ind.metrics$x, y = mfemale@other$ind.metrics$y)

            off[[i]] <- doff

        }

        gg <- do.call(rbind, off)

        # Make all offspring adults
        xx <- (lapply(off, function(x) x@other$ind.metrics))
        xx <- do.call(rbind, xx)
        gg@other$ind.metrics <- xx

        # Emigrate depending on dispersial distance

        for (i in 1:nInd(gg)) {

            # Use dispersal curves generated above to determine
            # distance
            if (gg@other$ind.metrics$sex[i] == "M")
                dis <- max(which(runif(1) > pdispM)) else dis = max(which(runif(1) > pdispF))

            # Dispersal direction
            dir <- runif(1, 0, 2 * pi)

            # Assign new coordinates
            gg@other$ind.metrics$x[i] <- gg@other$ind.metrics$x[i] + dis *
                cos(dir)
            gg@other$ind.metrics$y[i] <- gg@other$ind.metrics$y[i] + dis *
                sin(dir)
        }

        # Use torus to determine what to do with individuals that
        # disperse outside of extent
        gg@other$ind.metrics$x <- gg@other$ind.metrics$x%%100
        gg@other$ind.metrics$y <- gg@other$ind.metrics$y%%100

        # Plot
        plot(gg@other$ind.metrics$x, gg@other$ind.metrics$y, pch = 16, cex = 1,
            col = c("#E69F00", "#5D3FD3")[as.numeric(gg@other$ind.metrics$sex)],
            asp = 1, main = paste("Generation", ii), xlab = "x", ylab = "y")
        legend("bottomleft", legend = c("Female", "Male"), col = c("#E69F00",
            "#5D3FD3"), pch = 16, cex = 1)

    }

    # Output genlight object containing simulated data
    return(gg)
}

Now run the function to simulate a population with 100 individuals, 1000 SNPs, and male-biased dispersal.

Note

The simulation goes for 20 generations - you can see an animation below

simdat <- SimDisp(Nind = 100, Nsnp = 1000, pdispF = pdispfemale, pdispM = pdispmale,
    seed = 123)

What does this mean for fine-scale genetic structure? Can we identify different dispersal patterns for males and females? Let’s run genetic spatial autocorrelation to find out. We’ll use the function gl.spatial.autoCorr. This is a multivariate approach that combines all loci into a single analysis (Smouse & Peakall 1999; Peakall et al. 2003; Banks & Peakall 2012). The autocorrelation coefficient “r” is calculated for each pair of individuals in each specified distance class (called “bins” in this function).

We’ll use evenly distributed bins and compare individuals at 10 m intervals up to 50 m.

Significance testing

There are two ways to test whether fine-scale genetic structure is significantly positive (i.e., individuals are genetically similar) or negative (i.e., individuals are genetically dissimilar).

The first approach is a one-tail permutation test, which provides 95% null hypothesis confidence regions. If the autocorrelation coefficient “r” falls outsdide of this range, significant fine-scale spatial genetic structure is present.

The second approach is to estimate 95% bootstrap confidence intervals (CIs) around the value for r. These are obtained by drawing (with replacement) from within the set of relevant pairwise comparisons for a given distance class. If bootstrap CIs do not overlap zero, fine-scale spatial genetic structure is present. Bootstrap CIs also allow you to test for significant differences between groups (e.g., between females and males). When CIs are non-overlapping, there is a significant difference.

I tend to use 95% bootstrap CIs, since they are more conservative than permutational tests and allow for comparisons. The gl.spatial.autoCorr function below will output 95% bootstrap CIs. You can try a permutation test by adding permutation = TRUE (note that for this option to work, you will need to create separate plots for each sex by choosing plot.pops.together = FALSE).

# Make 'sex' the population name
pop(simdat) <- simdat@other$ind.metrics$sex

# Add xy coordinates to the 'other' slot in genlight
simdat@other$latlon <- data.frame(lon = simdat@other$ind.metrics$x, lat = simdat@other$ind.metrics$y)
simdat@other$latlon <- Mercator(simdat@other$latlon, inverse = T)

# Run genetic spatial autocorrelation
gl.spatial.autoCorr(simdat, bins = seq(0, 50, 10), plot.pops.together = TRUE,
    plot.colors.pop = c("#E69F00", "#5D3FD3"))

We can see that:

  1. Both females and males show significant positive spatial autocorrelation up to 30 m (where confidence intervals overlap zero). In other words, once you start comparing individuals that are 30 m apart, they are unlikely to be related/genetically similar (regardless of sex).
  2. Females have significantly stronger fine-scale genetic structure than males up to 10 m (i.e., a greater “r” value and female and male bootstrap CIs are non-overlapping).

These results are not surprising, since we set our distance classes based on the known (simulated) dispersal capacity of males and females, and we had large sample sizes (both individuals and loci). In reality, this analysis is a careful balance between power (maximising the sample size in each bin), and ensuring you are looking at the right scale (i.e., the size of the distance class matches the extent of spatial-genetic structure).

Exercise

  1. What happens when you vary the dispersal distances? Can you pick up small differences between the sexes?

  2. What happens if you change the number of individuals or SNPs? Does this influence the sensitivity of the analysis? Which is more important - more individuals or more loci?

  3. What happens when you vary the size and number of distance classes? How does this interact with sample size? What do you think would happen if you used a single 50 m distance class - do we still see the signal of male-biased dispersal?

Re-run the code from the beginning

Try varying the following parameters:

  • Dispersal curves: “d0” and “p” when running the p2p function

  • Sample sizes: “Nind” (number of individuals) and “Nsnp” (number of loci) when running the SimDisp function

  • Distance classes: “bins” when running the gl.spatial.autoCorr function

2a. Isolation-by-distance

Isolation-by-distance (IBD) is a key concept in population genetics. It describes a simple idea: that geographic distance can influence gene flow because individuals or populations that are geographically distant from each other are less likely to share genetic material than those that are close (based on the “stepping-stone” model; Kimura & Weiss 1964).

We can test for IBD with Mantel tests of matrix correspondence (Mantel 1967), implemented using the function gl.ibd. This test compares pairwise geographic and genetic distance matrices, using permutations to test for significance. For the latter, you can use pairwise estimates of population differentiation or individual-by-individual genetic distances.

It’s easy enough to plot the relationship between geographic and genetic distance. However, we can’t use a standard regression to test this relationship because pairwise data are not independent. Mantel tests solve this problem by using permutation to test for significance (e.g., the data are randomly shuffled. The result will be similar to the shuffled outcome if there is no relationship present).

Populations vs. individuals

IBD tests often use pairwise population metrics like FST. This is useful when there are lots of samples and populations are well defined. However, there are often situations where we don’t have multiple samples per location, populations are difficult to define and/or individuals are more continuously distributed. In this case, we can use the individual as the unit of analysis.

Where FST estimates represent evolutionary averages, pairwise individual genetic distances may represent more contemporary patterns of dispersal. Therefore, the approach you take depends on the species, your sampling, and the question you want to ask.

Let’s test for IBD in our simulated data set:

total.ibd <- gl.ibd(simdat, distance = "euclidean", coordinates = "latlon")

# Show Mantel statistics
total.ibd$mantel

Mantel statistic based on Pearson's product-moment correlation 

Call:
vegan::mantel(xdis = Dgen, ydis = Dgeo, permutations = permutations,      na.rm = TRUE) 

Mantel statistic r: 0.08552 
      Significance: 0.013 

Upper quantiles of permutations (null model):
   90%    95%  97.5%    99% 
0.0512 0.0634 0.0767 0.0866 
Permutation: free
Number of permutations: 999

There is significant IBD across all individuals, but it’s not very pronounced. Can we still see the signal of male-biased dispersal? Let’s create separate IBD plots for females and males:

f.ibd <- gl.ibd(simdat[simdat@pop == "F"], distance = "euclidean", coordinates = "latlon")

m.ibd <- gl.ibd(simdat[simdat@pop == "M"], distance = "euclidean", coordinates = "latlon")

# Show Mantel statistics for females
f.ibd$mantel

Mantel statistic based on Pearson's product-moment correlation 

Call:
vegan::mantel(xdis = Dgen, ydis = Dgeo, permutations = permutations,      na.rm = TRUE) 

Mantel statistic r: 0.1438 
      Significance: 0.014 

Upper quantiles of permutations (null model):
   90%    95%  97.5%    99% 
0.0828 0.1053 0.1302 0.1476 
Permutation: free
Number of permutations: 999
# Show Mantel statistics for males
m.ibd$mantel

Mantel statistic based on Pearson's product-moment correlation 

Call:
vegan::mantel(xdis = Dgen, ydis = Dgeo, permutations = permutations,      na.rm = TRUE) 

Mantel statistic r: 0.02018 
      Significance: 0.375 

Upper quantiles of permutations (null model):
   90%    95%  97.5%    99% 
0.0825 0.1027 0.1199 0.1514 
Permutation: free
Number of permutations: 999
Mantel test statistics

Although a regression line is useful for visualising the relationship between geographic and genetic distance, we don’t report the results of the regression (for reasons outlined above). Instead, we report the Mantel test statistic (and the p-value), which is the correlation between the two distance matrices. Values that approach -1 indicate a strong negative relationship, values close to 0 suggest there is no relationship, and values approaching +1 indicate a strong positive relationship (i.e., IBD).

Above we can see that there is significant IBD in females, since dispersal is more restricted. Males show no pattern of IBD, since many individuals can disperse right up to the spatial extent of our sampling. However, we lose some of the nuance from the previous analysis.

What can we learn from each analysis?

Mantel tests show us the broad trend across our study. IBD often acts as our null hypothesis (along with panmixia - i.e., random mating and dispersal) against which to test further landscape genetic analyses. However, unless the signal of sex-biased dispersal is very strong across the whole dataset, a Mantel test is unlikely to detect it. Spatial autocorrelation analysis is usually more powerful in detecting fine-scale genetic patterns than Mantel tests, because samples are “binned” and genetic structure is explored across multiple distance classes.

Exercise
  1. What happens if you use the proportion of shared alleles instead of euclidean distance?

We expect a linear relationship between geographic distance (sometimes log-transformed) and genetic distance. A discontinuous relationship can be an indication that something else is going on. We will explore this idea below using a real example.

Lesser hairy-footed dunnarts in the Pilbara

Sminthopsis youngsonii

Load the data

load("Data/Sy.gl.RData")  # genetic data

Sminthopsis youngsonii (or the lesser hairy footed dunnart), is a small carnivorous dasyurid found across the Australian arid region. As the name suggests, their hind foot is covered in short, bristly hairs, which are thought to help them move on sandy soils. They are usually found on sand dunes, inter dune swale and red sand plains.

These samples were taken in the Pilbara region at the very western edge of their range. Let’s take a look at a map:

leaflet(Sy.gl@other$latlong) %>%
    addTiles() %>%
    addCircleMarkers(lng = ~lon, lat = ~lat, popup = ~htmlEscape(lon))

Given this is the edge of their range, and the Pilbara is quite rocky and doesn’t harbour much of the ideal habitat for this species, we want to know a little more about dispersal and connectivity in this area. A good first step is to test for isolation by distance.

Sy.ibd <- gl.ibd(Sy.gl, distance = "euclidean")

# Show Mantel statistics
Sy.ibd$mantel

Mantel statistic based on Pearson's product-moment correlation 

Call:
vegan::mantel(xdis = Dgen, ydis = Dgeo, permutations = permutations,      na.rm = TRUE) 

Mantel statistic r: 0.6513 
      Significance: 0.001 

Upper quantiles of permutations (null model):
   90%    95%  97.5%    99% 
0.0613 0.0897 0.1204 0.1594 
Permutation: free
Number of permutations: 999

Yes - there is significant IBD! The pattern is very strong. But… some of the points don’t seem to follow the same slope? Why might this be?

There seems to be a big gap in sampling. This could be due to a sampling bias (e.g., the missing area is difficult to get to). Alternatively, the species may not occur here, suggesting there might be something going on with the habitat.

Given what we know about the species, what would be a reasonable hypothesis?

Could substrate be driving this sampling gap and the strange IBD pattern?

Let’s separate the individuals into two populations using longitude. You can click on the points in the map above to choose a location.

# Choose a point that separates the samples into two populations
lon.break <- 114.87
# Assign eastern and western populations based on this point
Sy.gl@pop <- as.factor(ifelse(Sy.gl@other$latlong$lon < lon.break, "1.West",
    "2.East"))

Now let’s take another look at IBD, but this time we’ll include the population information.

gl.ibd(Sy.gl, distance = "euclidean", paircols = "pop")

It looks like the magnitude of genetic distance is different and there are two different slopes describing IBD within each ‘population’. The points with large geographic and genetic distances represent among population comparisons. Discontinuities like this can sometimes suggest a barrier to gene flow. Let’s take a look at the substrate.

# Load a soil shapefile
soil <- st_read("Data/Soil.shp")  # spatial data

# Create a colour palette for the two populations
pop.pal <- colorFactor(palette = c("#F8766D", "#00BFC4"), domain = Sy.gl@pop)

# Create a palette
soil.pal <- colorFactor(palette = c("#9C640C", "#5c3001", "#C0392B", "#EDC001",
    "#FDEFB2"), domain = soil$Type)
# Generate another map
leaflet(cbind(pop = Sy.gl@pop, Sy.gl@other$latlong)) %>%
    addTiles() %>%
    addPolygons(data = soil, weight = 0, fillOpacity = 0.7, color = ~soil.pal(Type)) %>%
    addCircleMarkers(lng = ~lon, lat = ~lat, color = ~pop.pal(pop)) %>%
    addLegendFactor(pal = soil.pal, shape = "rect", fillOpacity = 0.7, opacity = 0,
        values = ~soil$Type, title = "Soil type", position = "topright",
        data = gadmCHE, group = "Polygons")

It looks like clay could be acting as a barrier. This is a good theory, but it would be useful to be able to visualise these patterns of isolation by distance spatially to get an indication of where gene flow is being restricted. We’ll explore this idea in the next section.

2b. Isolation-by-barrier

Up to this point, we’ve tested for IBD - but could be helpful to visually inspect patterns of IBD across the landscape. We can do this with EEMS…

2c. Isolation-by-resistance

Up to this point, we’ve only visually inspected the landscape, could be useful to explicitly test the correlation between gene flow and landscape features.